Cooke County
Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions
Hu, Yuanzhe, Wang, Yu, McAuley, Julian
Recent benchmarks for Large Language Model (LLM) agents primarily focus on evaluating reasoning, planning, and execution capabilities, while another critical component, memory (how agents memorize, update, and retrieve long-term information), remains under-evaluated due to the lack of benchmarks. We refer to agents with memory mechanisms as memory agents. In this paper, based on classic theories from memory science and cognitive science, we identify four core competencies essential for memory agents: accurate retrieval, test-time learning, long-range understanding, and selective forgetting. Existing benchmarks either rely on limited context lengths or are tailored for static, long-context settings like book-based QA, which do not reflect the interactive, multi-turn nature of memory agents that incrementally accumulate information. Moreover, no existing benchmarks cover all four competencies. We introduce MemoryAgentBench, a new benchmark specifically designed for memory agents. Our benchmark transforms existing long-context datasets and incorporates newly constructed datasets into a multi-turn format, effectively simulating the incremental information processing characteristic of memory agents. By carefully selecting and curating datasets, our benchmark provides comprehensive coverage of the four core memory competencies outlined above, thereby offering a systematic and challenging testbed for assessing memory quality. We evaluate a diverse set of memory agents, ranging from simple context-based and retrieval-augmented generation (RAG) systems to advanced agents with external memory modules and tool integration. Empirical results reveal that current methods fall short of mastering all four competencies, underscoring the need for further research into comprehensive memory mechanisms for LLM agents.
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > Texas > McMullen County (0.04)
- North America > United States > Texas > Cooke County (0.04)
- Asia > China > Hong Kong (0.04)
Entropy-Reinforced Planning with Large Language Models for Drug Discovery
Liu, Xuefeng, Tien, Chih-chan, Ding, Peng, Jiang, Songhao, Stevens, Rick L.
The objective of drug discovery is to identify chemical compounds that possess specific pharmaceutical properties toward a binding target. Existing large language models (LLMs) can achieve high token matching scores in terms of likelihood for molecule generation. However, relying solely on LLM decoding often results in the generation of molecules that are either invalid due to a single misused token, or suboptimal due to unbalanced exploration and exploitation as a consequence of the LLM's prior experience. Here we propose ERP, Entropy-Reinforced Planning for Transformer Decoding, which employs an entropy-reinforced planning algorithm to enhance the Transformer decoding process and strike a balance between exploitation and exploration. ERP aims to achieve improvements in multiple properties compared to direct sampling from the Transformer. We evaluated ERP on the SARS-CoV-2 virus (3CLPro) and human cancer cell target protein (RTCB) benchmarks and demonstrated that, in both benchmarks, ERP consistently outperforms the current state-of-the-art algorithm by 1-5 percent, and baselines by 5-10 percent, respectively. Moreover, such improvement is robust across Transformer models trained with different objectives. Finally, to further illustrate the capabilities of ERP, we tested our algorithm on three code generation benchmarks and outperformed the current state-of-the-art approach as well. Our code is publicly available at: https://github.com/xuefeng-cs/ERP.
- Europe > Austria > Vienna (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Texas > McMullen County (0.04)
- (2 more...)
Foreign survivors of brutal Hamas attack on Israel recall terror massacre: 'Everything was burning'
JERUSALEM – For Mitchai Sarabon, a Thai fieldhand working on Kibbutz Alumim in southern Israel, Oct. 7 started like any other Saturday. His one day off a week, the 32-year-old said, he woke early and began doing his laundry. His friends – a mix of Thai migrant workers and Nepalese agricultural students – were also milling about the compound where they lived on the edge of the kibbutz, taking care of various personal tasks, when suddenly they heard gunshots. "Suddenly, I saw one of the Nepalese guys being shot, others ran to hide in a bomb shelter and then the terrorists arrived," Sarabon recounted to Fox News Digital in a video interview from his home in Udon Thani, Thailand, on Friday. "They threw a grenade inside, some of the people died instantly and others ran away, they were shot dead too."
- Asia > Middle East > Israel > Southern District (0.26)
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.25)
- Asia > Thailand > Udon Thani > Udon Thani (0.25)
- (6 more...)
- Media > News (0.93)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.70)
- Government > Regional Government > Asia Government > Middle East Government > Palestine Government (0.48)
Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization
Kristiadi, Agustinus, Immer, Alexander, Eschenhagen, Runa, Fortuin, Vincent
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks. It is theoretically compelling since it can be seen as a Gaussian process posterior with the mean function given by the neural network's maximum-a-posteriori predictive function and the covariance function induced by the empirical neural tangent kernel. However, while its efficacy has been studied in large-scale tasks like image classification, it has not been studied in sequential decision-making problems like Bayesian optimization where Gaussian processes -- with simple mean functions and kernels such as the radial basis function -- are the de facto surrogate models. In this work, we study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility. However, we also present some pitfalls that might arise and a potential problem with the LLA when the search space is unbounded.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
A Robust Bias Mitigation Procedure Based on the Stereotype Content Model
Ungless, Eddie L., Rafferty, Amy, Nag, Hrichika, Ross, Björn
The Stereotype Content Model (SCM) states that we tend to perceive minority groups as cold, incompetent or both. In this paper we adapt existing work to demonstrate that the Stereotype Content Model holds for contextualised word embeddings, then use these results to evaluate a fine-tuning process designed to drive a language model away from stereotyped portrayals of minority groups. We find the SCM terms are better able to capture bias than demographic-agnostic terms related to pleasantness. Further, we were able to reduce the presence of stereotypes in the model through a simple fine-tuning procedure that required minimal human and computer resources, without harming downstream performance. We present this work as a prototype of a debiasing procedure that aims to remove the need for a priori knowledge of the specifics of bias in the model.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > United Kingdom (0.14)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (8 more...)
'Nothing to do, nowhere to go': What happens when elephants live alone
On a raw December day, as Christmas music blares over loudspeakers, an African elephant named Asha walks in tight circles in an enclosure at Natural Bridge Zoo, a roadside attraction in Virginia. Her living quarters consist of a barn and three outdoor yards--a fenced patch of grass about 90 by 40 feet, a dirt patch with a few logs scattered about, and a yard where she gives rides to children for $15 and her massive feet have worn a ring into the grass. Her space is barren--no shrubs, trees, or watering holes. Elephants, like humans, are social animals. In the wild, females typically live in herds of eight or more, yet Asha, who's nearly 40 years old, has been confined mostly alone for more than 30 years.
- North America > United States > Virginia (0.25)
- North America > United States > Vermont (0.05)
- North America > United States > Texas > Cooke County > Gainesville (0.04)
- (8 more...)
- Law (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Law Enforcement & Public Safety (0.96)
- (4 more...)
An Inconsistency-Tolerant Approach to Information Merging Based on Proposition Relaxation
Schockaert, Steven (Ghent University) | Prade, Henri (Université Paul Sabatier)
Inconsistencies between different information sources may arise because of statements that are inaccurate, albeit not completely false. In such scenarios, the most natural way to restore consistency is often to interpret assertions in a more flexible way, i.e. to enlarge (or relax) their meaning. As this process inherently requires extra-logical information about the meaning of atoms, extensions of classical merging operators are needed. In this paper, we introduce syntactic merging operators, based on possibilistic logic, which employ background knowledge about the similarity of atomic propositions to appropriately relax propositional statements.
- North America > United States > Florida > Alachua County > Gainesville (0.05)
- North America > United States > Georgia > Hall County > Gainesville (0.05)
- North America > United States > New York (0.04)
- (4 more...)